DECLARATION UNDER 37 CFR 1.131 IN SUPPORT OF PRIOR INVENTION 



Sir: 

I, Orna Etzion, declare: 

1. I am an inventor of the claims of the above-captioned patent application ("the 
Application") and an inventor of the subject matter described therein. 

2. Prior to June 16, 2000, the filing date of U.S. Patent No. 6,725,361 cited in an 
Office Action mailed June 01, 2004, the invention claimed in the Application had 
been conceived and reduced to practice in the United States. 
in Support of Application 
Application No. 09/676,175 



3. Attached Exhibit A is a redacted copy of an invention disclosure form describing 
the design of "A Method and Apparatus for Generating an Expected Top of 
Stack During Instruction Translation," and establishes that the subject matter 
claimed in the Application had been reduced to practice in the United States prior 
to June 16, 2000. Exhibit A (the invention disclosure) describes the operation of 
generating an expected top of stack during instruction translation, as is described 
and claimed in our application. More specifically, the figure on page 3 of Exhibit 
A illustrates the features of claim 1. The feature of "translating a first block of 
instructions executable in a first processor architecture, into a translated first block 
of instructions executable in a second processor architecture, said translated first 
block of instructions operating with a stack of data entry positions" is shown in 
the top five blocks of the figure with the transformation of 'Code Block L1' to 
'Code Block L2'. The feature of "generating an expected Top of Stack (TOS) 
position in said stack for said first block of code" is shown in the top five blocks 
of the figure by the entry condition 'Expected TOS'. The feature of "adding at 
least one instruction to said translated first block of instructions to determine if 
said first expected TOS is equal to an actual TOS at a time of executing said 
translated first block of instructions" is shown in the top five blocks of the figure 
in the translated pseudo-code sections of 'Code Block L1' and 'Code Block L2'. 

4. The subject matter claimed in the Application was actually reduced to practice 
prior to June 16, 2000 because the technique claimed in Exhibit A had been 
successfully implemented before this date, as noted on page 1, paragraph 3 in 
Exhibit A. 
that these statements are made with the knowledge that willful false statements and the 
like so made are punishable by fine or imprisonment, or both, under section 1001 of Title 
18 of the United States Code, and that such willful false statements may jeopardize the 
validity of the application or any patent issuing thereon. 

Dated: [illegible] 2005 

Orna Etzion 

INTEL INVENTION DISCLOSURE : JUL-7898 

INTEL CONFIDENTIAL 

LEGAL ID#: [illegible]    DATE: July 7, 1999 

It is important to provide accurate and detailed information on this form. The information will be used to evaluate your 
invention for possible filing as a patent application. When completed, please return this form to the Legal Department at 
RN4-01. If you have any questions regarding this form or to whom it should be forwarded, please call 
765-1369, 696-2851 or 554-3996. 




1. Inventor(s): 

Name: Orna Etzion    SS#: N/A 

Empl. No.: 10122359    Dept.#: [illegible]    Phone: 4865-5720    M/S: IDC-1D 

Home Address: 5 Kariv st., Haifa, Israel 

Citizenship: Israel    Supervisor*: Yaron Sheffer    M/S: IDC-1D 

Group Name: MPL    Division Name: MPG 

[Stamp: RECEIVED, Jul 08 1993, PATENT DATABASE GROUP, INTEL LEGAL TEAM] 

Name: [illegible]    SS#: N/A 

Empl. No.:    Dept.#: 6985    Phone: [illegible]    M/S: IDC-1D 

Home Address: 

Citizenship: Israel    Supervisor*: Yaron Sheffer    Phone: 4865-5759    M/S: IDC-1D 

Group Name: ML    Division Name: MPG 



2. Title of Invention: A method for efficiently maintaining synchronization of a simulated circular-stack of registers during binary 
translation. 

3. Stage of development, i.e. % complete, and relation of technology to the following product/process: 

The technique has been implemented in a dynamic IA32->IA64 binary translator, which is currently a research project, for 
floating-point stack simulation. 

4. (a) Has a description of your invention been, or will it shortly be, published outside Intel: 

NO:    YES: X    DATE WAS OR WILL BE PUBLISHED: 10/99 



If YES, was the manuscript submitted for pre-publication approval? YES: x NO: 

(b) Has your invention been used/sold or planned to be used/sold by Intel or others? 

NO:    YES: X    DATE WAS OR WILL BE SOLD: may be used in future implementations of IA64, not 
yet on plan of record. 

5. If invention conceived, or constructed during performance of a government or third party contract, please check here 
and give the contract name and number 



6. Please attach a page to this form, DATED AND SIGNED BY ONE INVENTOR (PREPARER), to provide an abstract 
of your invention, and include the following information in your abstract: 

(a) State general purpose(s) of your invention; 

(b) Describe advantage(s) of your invention over what is done now; 

(c) Describe essential element(s) or key to your invention; and 

(d) Value of your invention to Intel (how will it be used?). 

*HAVE YOUR SUPERVISOR READ, DATE AND SIGN COMPLETED FORM 
DATE: [illegible]    SUPERVISOR: Yaron Sheffer 

BY THIS SIGNING, I (SUPERVISOR) ACKNOWLEDGE THAT I HAVE READ AND UNDERSTAND THIS 
DISCLOSURE, AND RECOMMEND THAT THE HONORARIUM BE PAID. 
INTEL CONFIDENTIAL 

General purpose of the invention 

The purpose of this invention is to efficiently maintain synchronization of a simulated circular register stack. The invention 
may be valuable for binary translation from a source computer architecture that contains such a stack to a target 
architecture that supports a flat register file. The invention may be used in dynamic or static binary translators, as well as 
in architectural simulators or virtual-machine implementations using similar code-generation-based techniques. In 
particular, the invention provides a significant performance advantage when translating Intel Architecture floating-point 
code to any other architecture. 

Advantages of the invention over what is done now 

The invention is significantly faster than any known alternative. 

Emulating a stack rotation by multiple move operations requires performing those moves for every stack push or pop. The 
number of required moves per occurrence is the size of the stack, and they contain many internal dependencies. In 
the proposed invention, the rotation moves are performed only in extremely rare cases. 
Emulating a stack in memory suffers from a great load-store overhead, which the proposed invention avoids. 
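
To make the cost argument concrete, the following is a minimal sketch of the move-based alternative. It assumes a hypothetical emulation in which ST(i) is pinned to slot i of a list, so that every pop must shift all values. The names naive_pop and STACK_SIZE are illustrative only and do not appear in the disclosure.

```python
STACK_SIZE = 8  # eight FP stack entries, as in IA32


def naive_pop(slots):
    """Pop with ST(i) pinned to slots[i]; returns the number of moves made.

    Every remaining value shifts one slot towards ST(0) and the vacated
    entry wraps around to ST(7), so the cost is always STACK_SIZE moves.
    """
    moves = 0
    vacated = slots[0]
    for i in range(STACK_SIZE - 1):
        slots[i] = slots[i + 1]  # each move reads the slot the next move overwrites
        moves += 1
    slots[STACK_SIZE - 1] = vacated
    moves += 1
    return moves


slots = list("ABCDEFGH")
print(naive_pop(slots), slots)
```

Under this assumption a single pop costs eight dependent moves, which is the per-occurrence overhead the disclosure says it avoids in the common case.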

Essential elements or key to the invention 

The following section demonstrates the key elements of the invention using, as an example, an IA32->IA64 binary 
translator. The relevant aspect is the emulation of the IA32 floating-point (FP) register stack, using the flat FP register-file of an 
IA64 target machine. 

References to the eight physical FP-registers of the Intel IA32 architecture are always stack-relative. The mapping 
between stack-relative references and physical registers changes dynamically. For example, the physical registers 
corresponding to ST(0) before and after executing an FLD instruction are different, since FLD pushes a value onto the 
FP-stack. 
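
The dynamic mapping can be illustrated with a short sketch. It assumes the convention that a push decrements the top-of-stack index and a pop increments it, modulo 8, which is consistent with the example in this disclosure where a pop changes the TOS from 5 to 6. The function names are hypothetical.

```python
NUM_FP_REGS = 8  # eight physical IA32 FP registers


def physical_index(tos, i):
    """Physical register index that ST(i) names when the top of stack is `tos`."""
    return (tos + i) % NUM_FP_REGS


def push(tos):
    """A push (e.g. FLD) moves the top of stack down by one, modulo 8."""
    return (tos - 1) % NUM_FP_REGS


def pop(tos):
    """A pop moves the top of stack up by one, modulo 8."""
    return (tos + 1) % NUM_FP_REGS


tos = 5
before = physical_index(tos, 0)  # register named ST(0) before the FLD
tos = push(tos)
after = physical_index(tos, 0)   # register named ST(0) after the FLD
print(before, after)
```

The same stack-relative name ST(0) refers to two different physical registers, which is why a translator cannot bind ST(i) to a fixed target register without knowing the stack depth.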

However, in the vast majority of practical cases, multiple run-time entries to the same code block repeat the same 
stack-depth at entrance. Speculating the state at the entry point allows an effective static mapping between any IA32 FP 
register-references in the block and the corresponding IA64 FP-registers. To take advantage of such a speculative 
approach, the following mechanisms are supported: 

1. Stack depth speculation - effectively guessing the run-time stack state at all or almost all entries to the block. The 
speculation is done prior to the block translation. A dynamic translator uses the 1st run-time entry state (which is already 
known when the block is reached). A static translator has to perform code analysis and walk-through to predict the 
entrance state effectively. 

2. Tracking the speculation realization - keeping the actual run-time stack state and verifying that the speculative 
assumption (taken at the translation of the block) is indeed true at each run-time entry. The actual stack depth is 
updated at the end of the block execution, which is a single operation that reflects the overall effect of the entire block. 
If the block is balanced (same number of pushes and pops), this code is eliminated. At the beginning of each block, 
checking code is executed that compares the assumed (speculated) stack depth with the actual one. 

3. Recovery mechanism - ensure correct operation when the check fails. The recovery is achieved by actual rotation 
(copy of register values), so the actual top-of-stack moves to fit the expected one. The block code remains as is. This 
method of recovery ensures that the penalty does not propagate: when control is transferred to the next block, the 
correction is already done, and the stack-depth expected by the next block matches the actual depth. 
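
The three mechanisms above can be sketched together as follows. This is a minimal illustration, not the implementation referred to in the disclosure: the Machine class, make_block, and the use of a Python list as the flat register file are all assumptions made for the example.

```python
NUM_REGS = 8


class Machine:
    """Hypothetical run-time state: flat register file plus the actual TOS."""

    def __init__(self, regs, actual_tos):
        self.regs = list(regs)
        self.actual_tos = actual_tos
        self.recoveries = 0  # how many times the recovery path ran


def rotate_stack(m, delta):
    """Recovery: copy register values so the actual TOS moves by `delta`."""
    old = list(m.regs)
    for p in range(NUM_REGS):
        m.regs[(p + delta) % NUM_REGS] = old[p]
    m.actual_tos = (m.actual_tos + delta) % NUM_REGS
    m.recoveries += 1


def make_block(body, expected_tos, net_pops):
    """Wrap a statically mapped body with the entry check and the epilogue."""
    def run(m):
        if m.actual_tos != expected_tos:       # tracking: check at block entry
            rotate_stack(m, expected_tos - m.actual_tos)
        body(m.regs)                           # block code itself is unchanged
        if net_pops != 0:                      # epilogue, omitted when balanced
            m.actual_tos = (expected_tos + net_pops) % NUM_REGS
    return run


def fmulp_body(regs):
    # Translated for expected TOS 5: ST(1) is slot 6 and ST(0) is slot 5.
    regs[6] = regs[6] * regs[5]


block = make_block(fmulp_body, expected_tos=5, net_pops=1)

hit = Machine([0, 0, 0, 0, 0, 2.0, 3.0, 0], actual_tos=5)   # speculation holds
block(hit)
miss = Machine([0, 0, 0, 0, 2.0, 3.0, 0, 0], actual_tos=4)  # speculation fails
block(miss)
print(hit.regs[6], hit.recoveries, miss.regs[6], miss.recoveries)
```

Both runs produce the same product in the same register; only the mis-speculated entry pays for a rotation, and it leaves the actual TOS at the value the next block expects.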

Note: This invention disclosure does not describe how stack exception conditions are detected. The solution to that 
problem is covered by another patent disclosure. 

Example 

The example on the following page consists of 2 very simple floating-point blocks. It shows the behavior of the translation 
mechanism in the regular case (when the expected Top-Of-Stack equals the actual one), and in the special case (when 
they are different). Note that the L2 block is balanced, hence no update of the actual TOS value is done at its epilogue. Also 
note that the correction done for L1 (in the special case) does not affect the normal flow at L2. The Actual TOS value is 
best held in a global integer register (but not necessarily). 

As already stated, although the example refers to IA32->IA64 translation, the invention principles are applicable to any 
other case of emulating a rotating stack by a static register file. 
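
The translation-time side of the example can be sketched as below. The sketch assumes that physical slot p is held in target register f(20 + p), as the register names in the figure suggest, and it models only three source operations. The function translate_block, the operation names as plain strings, and the "load [mem]" placeholder are hypothetical. The entry check is left out so that only the static register mapping and the epilogue decision are shown.

```python
NUM_REGS = 8
BASE = 20  # assumed: physical slot p lives in target register f(20 + p)


def freg(tos, i):
    """Target register that statically stands for ST(i) at stack depth `tos`."""
    return f"f{BASE + (tos + i) % NUM_REGS}"


def translate_block(source, expected_tos):
    """Map stack-relative operations to flat registers under a speculated TOS."""
    tos = expected_tos
    out = []
    for op in source:
        if op == "FMULP":            # ST(1) = ST(1) * ST(0), then pop
            out.append(f"{freg(tos, 1)} = {freg(tos, 1)} * {freg(tos, 0)}")
            tos = (tos + 1) % NUM_REGS
        elif op == "FADDP":          # ST(1) = ST(0) + ST(1), then pop
            out.append(f"{freg(tos, 1)} = {freg(tos, 0)} + {freg(tos, 1)}")
            tos = (tos + 1) % NUM_REGS
        elif op == "FLD":            # push, then load into the new ST(0)
            tos = (tos - 1) % NUM_REGS
            out.append(f"{freg(tos, 0)} = load [mem]")
        else:
            raise ValueError(f"unsupported operation: {op}")
    if tos != expected_tos:          # unbalanced block: one update at the end
        out.append(f"Actual_TOS = {tos}")
    return out, tos


for label, source, expected in [("L1", ["FMULP"], 5), ("L2", ["FADDP", "FLD"], 6)]:
    lines, exit_tos = translate_block(source, expected)
    print(label, lines, exit_tos)
```

Under these assumptions L1 receives an epilogue that sets the actual TOS to 6, while the balanced L2 block receives none, in line with the note above.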



Value of the invention to Intel: how will it be used? 

This invention is valuable to Intel because it can be used to significantly speed up the floating-point performance of 
IA32->IA64 dynamic binary translation. Such a project currently exists as a research project, but the technology is 
expected to eventually enter a commercial product of strategic importance to Intel. 
Entry conditions: 

Expected TOS = 5 
Actual TOS = 5 

Source   Value   Target 
ST(2)    C       f27 
ST(1)    B       f26 
ST(0)    A       f25 
ST(7)    *       f24 
ST(6)    *       f23 
ST(5)    *       f22 
ST(4)    *       f21 
ST(3)    *       f20 



Code Block L1 

Source: 

L1: FMULP ; //pop 
    JMP L2 

Translated pseudo-code: 
L1: Cmp 5, Actual_TOS 
    NE ? BR Correct 
    f26 = f26 * f25 
    Actual_TOS = 6 
    BR L2 



After L1 execution: 

Expected TOS = 6 
Actual TOS = 6 

Source   Value   Target 
ST(1)    C       f27 
ST(0)    AB      f26 
ST(7)    *       f25 
ST(6)    *       f24 
ST(5)    *       f23 
ST(4)    *       f22 
ST(3)    *       f21 
ST(2)    *       f20 



Code Block L2 

Source: 

L2: FADDP ; //pop 
    FLDE [eax] ; //push 
    JMP L3 

Translated pseudo-code: 
L2: Cmp 6, Actual_TOS 
    NE ? BR Correct 
    f27 = f26 + f27 
    flde f26 = [r20] 
    BR L3 



After L2 execution: 

Expected TOS = 6 
Actual TOS = 6 

Source   Value   Target 
ST(1)    AB+C    f27 
ST(0)    X       f26 
ST(7)    *       f25 
ST(6)    *       f24 
ST(5)    *       f23 
ST(4)    *       f22 
ST(3)    *       f21 
ST(2)    *       f20 



Entry conditions: 

Expected TOS = 5 
Actual TOS = 4 

Source   Value   Target 
ST(3)    D       f27 
ST(2)    C       f26 
ST(1)    B       f25 
ST(0)    A       f24 
ST(7)    *       f23 
ST(6)    *       f22 
ST(5)    *       f21 
ST(4)    *       f20 



Code Block L1 

Source: 

L1: FMULP ; //pop 
    JMP L2 

Translated pseudo-code: 
L1: Cmp 5, Actual_TOS 
    NE ? BR Correct 
    f26 = f26 * f25 
    Actual_TOS = 6 
    BR L2 



Correction pseudo-code: 
Delta = Expected_TOS - Actual_TOS 
Rotate_stack (Delta) 
Return (to L1) 

After correction code: 
Expected TOS = 5 
Actual TOS = 5 
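
The correction step can be checked against the entry values of this case with a small sketch. It assumes that Rotate_stack moves every value up by Delta slots modulo 8 and that slot p is held in register f(20 + p). The dictionary representation and the helper below are illustrative only.

```python
BASE = 20      # assumed: physical slot p lives in target register f(20 + p)
NUM_REGS = 8


def rotate_stack(regs, delta):
    """Return a new register map with every value moved `delta` slots up, modulo 8."""
    rotated = {}
    for name, value in regs.items():
        slot = int(name[1:]) - BASE
        rotated[f"f{BASE + (slot + delta) % NUM_REGS}"] = value
    return rotated


expected_tos, actual_tos = 5, 4
regs = {"f20": "*", "f21": "*", "f22": "*", "f23": "*",
        "f24": "A", "f25": "B", "f26": "C", "f27": "D"}

delta = expected_tos - actual_tos
regs = rotate_stack(regs, delta)
actual_tos = (actual_tos + delta) % NUM_REGS
print(actual_tos, regs["f25"], regs["f26"], regs["f27"], regs["f20"])
```

After the rotation A, B and C sit in f25 to f27, where the translated L1 code expects ST(0) to ST(2), and D wraps around to f20.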



Code Block L1 

Source: 

L1: FMULP ; //pop 
    JMP L2 

Translated pseudo-code: 
L1: Cmp 5, Actual_TOS 
    NE ? BR Correct 
    f26 = f26 * f25 
    Actual_TOS = 6 
    BR L2 



After L1 execution: 

Expected TOS = 6 
Actual TOS = 6 

Source   Value   Target 
ST(1)    C       f27 
ST(0)    AB      f26 
ST(7)    *       f25 
ST(6)    *       f24 
ST(5)    *       f23 
ST(4)    *       f22 
ST(3)    *       f21 
ST(2)    D       f20 



Code Block L2 

Source: 

L2: FADDP ; //pop 
    FLDE [eax] ; //push 
    JMP L3 

Translated pseudo-code: 
L2: Cmp 6, Actual_TOS 
    NE ? BR Correct 
    f27 = f26 + f27 
    flde f26 = [r20] 
    BR L3 



Register state after the correction code (Expected TOS = 5, Actual TOS = 5): 

Source   Value   Target 
ST(2)    C       f27 
ST(1)    B       f26 
ST(0)    A       f25 
ST(7)    *       f24 
ST(6)    *       f23 
ST(5)    *       f22 
ST(4)    *       f21 
ST(3)    D       f20 




After L2 execution: 

Expected TOS = 6 
Actual TOS = 6 

Source   Value   Target 
ST(1)    AB+C    f27 
ST(0)    X       f26 
ST(7)    *       f25 
ST(6)    *       f24 
ST(5)    *       f23 
ST(4)    *       f22 
ST(3)    *       f21 
ST(2)    D       f20 